{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "#r \"nuget: TorchSharp-cpu\"\n",
    "\n",
    "open TorchSharp\n",
    "open type TorchSharp.torch\n",
    "open type TorchSharp.TensorExtensionMethods\n",
    "open type TorchSharp.torch.distributions"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Models\n",
    "\n",
    "While doing numerical computing with Torch is useful and productive, modeling is what Torch is all about, whether the host language is Python, C#, or F#. It's its reason for being.\n",
    "\n",
    "In this tutorial, we will introduce the basics of modeling with TorchSharp. The intent is not to teach you how to come up with a good model, or do data science. It's merely to teach the mechanics of model construction and use in TorchSharp.\n",
    "\n",
    "It's important to understand how LibTorch, the engine underneath Pytorch and TorchSharp works -- it's a __dynamic__ model engine, which means that the engine is there to keep track of work that is done from the host language, tracing it and doing the necessary bookeeping to be able to perform backpropagation. This is very different form how early frameworks worked, for example Tensorflow v1.X -- in those frameworks, a model graph was built by the host code, and only once the graph was built could you run the model.\n",
    "\n",
    "The static approach of TF v1.X had some challenges associated with it, one of which was that it was hard to debug when code didn't run eagerly. One of the upsides of the static approach is that it's relatively straight-forward to externalize the whole model: its weights and its logic.\n",
    "\n",
    "That is not so with a dynamic framework -- the weights and the logic are tied, but only loosely. In order to execute the model during training, and after, you need the code for the model, expressed in the host language. Thus, the focus of much of this discussion will be on the model code."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Classes\n",
    "\n",
    "Typically, we want to keep the logic of a model in a class of its own. This makes it easy to use other TorchSharp constructs to combine and run models. Conceptually, that's all a model is -- a Tensor -> Tensor function implemented as a class with a 'forward()' function. This function is where the logic of the model is placed. (If C# supported operator(), like C++ does, that's what we'd use here, instead. Just like Python does.)\n",
    "\n",
    "TorchSharp makes it easy to build models, because you only have to specify the forward function. To do backprogation, you also need the backward function, which supports using the chain rule of calculus to calculate gradients. In Torch, the backward function is automatically implemented as long as the forward function relies only on Torch APIs for computations.\n",
    "\n",
    "Let's start with a super-simple model, which has only one single linear layer that expect a tensor with 1000 input elements and will yield a tensor with 100 elements:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "type Trivial() as this = \n",
    "    inherit nn.Module<Tensor,Tensor>(\"Trivial\")\n",
    "\n",
    "    let lin1 = nn.Linear(1000L, 100L)\n",
    "\n",
    "    do\n",
    "        this.RegisterComponents()\n",
    "\n",
    "    override _.forward(input) = lin1.forward(input)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now invoke the model with some random data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let input = rand(1000)\n",
    "let model = Trivial()\n",
    "model.forward(input)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we say that a model expects a tensor with 1000 inputs, we're actually ignoring that in most scenarios, data is fed into a model in batches. The size of the Linear input is actually just about the last dimension. We can feed tensors of Nx1000 and will get a tensor Nx100 out:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "model.forward(rand(3,1000))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's perfectly possible to have more than one dimension preceding the last, but that is not very common. Batches are usually just presented by adding one first dimension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "model.forward(rand(2,3,1000))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All TorchSharp operators will accept either batched or non-batched input data, but not all operators will accept more than one preceding dimension. Linear is not unique in this regard, but it's not generally the case."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If all we wanted to use was a single Linear layer, the model class wouldn't strictly speaking be necessary. The real value of the class, which matches most real-world scenarios, is to combine more than one operator in the model, and abstract (hide) it from the outside world. The most simple addition we can make is to add ReLU to the model.\n",
    "\n",
    "Since ReLU, as do all activation functions, does not rely on trainable weights, there is no need to create a module and keep it in a field, we can just invoke it as a function. There is a module version of ReLU, too, which has its use, but in this situation, it's overkill."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "type Trivial() as this = \n",
    "    inherit nn.Module<Tensor,Tensor>(\"Trivial\")\n",
    "\n",
    "    let lin1 = nn.Linear(1000L, 100L)\n",
    "\n",
    "    do\n",
    "        this.RegisterComponents()\n",
    "\n",
    "    override _.forward(input) = \n",
    "    \n",
    "        use x = lin1.forward(input)\n",
    "        nn.functional.relu(x)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With ReLU in place, there will be no negative numbers in the output, as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let input = rand(1000)\n",
    "let model = Trivial()\n",
    "model.forward(input)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the model slightly more interesting, let's add another layer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "type Trivial() as this = \n",
    "    inherit nn.Module<Tensor,Tensor>(\"Trivial\")\n",
    "\n",
    "    let lin1 = nn.Linear(1000L, 100L)\n",
    "    let lin2 = nn.Linear(100L, 10L)\n",
    "\n",
    "    do\n",
    "        this.RegisterComponents()\n",
    "\n",
    "    override _.forward(input) = \n",
    "    \n",
    "        use x = lin1.forward(input)\n",
    "        use y = nn.functional.relu(x)\n",
    "        lin2.forward(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let model = Trivial()\n",
    "model.forward(input)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With this, we're ready to start training the model. Let's make up some fake (random) data, and then train the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let dataBatch = rand(32,1000)  // Our pretend input data\n",
    "let resultBatch = rand(32,10)  // Our pretend ground truth.\n",
    "dataBatch.ToString()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we need a loss function that compares the output from the model with the ground truth, our labels. Here, we can use mean square error, a.k.a. \"MSE\". The loss produces a scalar value that represents how good the prediction is compared with the truth. Close to 0 is good, far from it is bad. It's a good idea to know how bad the loss is when we start, i.e. when the model is completely random."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let loss x y = nn.functional.mse_loss(x,y)\n",
    "let pred = model.forward(dataBatch)\n",
    "(loss pred resultBatch).item<single>()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "Training the consists of repeatedly computing the loss, then the gradients, then applying the gradient to the model weights before starting over."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let learning_rate = 0.001f.ToScalar()\n",
    "\n",
    "// Compute the loss\n",
    "let pred = model.forward(dataBatch)\n",
    "let output = loss pred resultBatch\n",
    "\n",
    "// Clear the gradients before doing the back-propagation\n",
    "model.zero_grad()\n",
    "\n",
    "// Do back-progatation, which computes all the gradients.\n",
    "output.backward()\n",
    "\n",
    "using(torch.no_grad()) (fun _ -> \n",
    "    for param in model.parameters() do\n",
    "        let grad = param.grad()\n",
    "        match grad with\n",
    "        | null -> ()\n",
    "        | _ -> \n",
    "            let update = grad.mul(learning_rate)\n",
    "            param.sub_(update) |> ignore\n",
    ")\n",
    "(loss pred resultBatch).item<single>()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training once, we should see a smaller loss. If you keep running the cell above over and over (Ctrl-Enter instead of Shift-Enter), the loss should decrease. Not by very much, perhaps, but it should keep going toward zero."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizers\n",
    "\n",
    "The last step of the training logic above is called 'optimization,' which is used in its mathematical sense: we're minimizing the loss function, i.e. finding the inputs for which the loss is the smallest. Note that we are not minimizing it with respect to the input data, we're minimizing with respect to the weights.\n",
    "\n",
    "Explicitly writing out the minimization steps like above will get very tedious, and far more complex optimization approaches exist than the simple on above (gradient descent). These are captured in classes appropriately called 'optimizers'. Instead of doing the above, we can use an optimizer and use it to adjust the weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let learning_rate = 0.001\n",
    "\n",
    "let optimizer = torch.optim.SGD(model.parameters(), learning_rate)\n",
    "\n",
    "// Compute the loss\n",
    "let pred = model.forward(dataBatch)\n",
    "let output = loss pred resultBatch\n",
    "\n",
    "// Clear the gradients before doing the back-propagation\n",
    "model.zero_grad()\n",
    "\n",
    "// Do back-progatation, which computes all the gradients.\n",
    "output.backward()\n",
    "\n",
    "optimizer.step()\n",
    "\n",
    "(loss pred resultBatch).item<single>()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are lots of optimizers available, and it's rarely the case that you can know beforehand which one will be good for your model. How do you measure what is good? \n",
    "\n",
    "What we care about in training is to quickly reach the point where further training is pointless, because the loss no longer goes down. You want to get there spending as little time and resources as possible. One thing you can do is adjust the learning rate, but you may have to choose a more sophisticated optimizer (which do more compute and add training time).\n",
    "\n",
    "In this case, a bigger learning rate will make the loss go down quicker, but that isn't always the case. If the learning rate is too big, you can get in trouble, too. Among other things, the loss may never converge toward its lowest point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let learning_rate = 0.01\n",
    "\n",
    "let optimizer = torch.optim.SGD(model.parameters(), learning_rate)\n",
    "\n",
    "// Compute the loss\n",
    "let pred = model.forward(dataBatch)\n",
    "let output = loss pred resultBatch\n",
    "\n",
    "// Clear the gradients before doing the back-propagation\n",
    "model.zero_grad()\n",
    "\n",
    "// Do back-progatation, which computes all the gradients.\n",
    "output.backward()\n",
    "\n",
    "optimizer.step()\n",
    "\n",
    "(loss pred resultBatch).item<single>()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using a different optimizer is easy, but not always better. Sometimes, as in this case, a simple optimizer is better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let learning_rate = 0.01\n",
    "\n",
    "let optimizer = torch.optim.Adam(model.parameters())\n",
    "\n",
    "// Compute the loss\n",
    "let pred = model.forward(dataBatch)\n",
    "let output = loss pred resultBatch\n",
    "\n",
    "// Clear the gradients before doing the back-propagation\n",
    "model.zero_grad()\n",
    "\n",
    "// Do back-progatation, which computes all the gradients.\n",
    "output.backward()\n",
    "\n",
    "optimizer.step()\n",
    "\n",
    "(loss pred resultBatch).item<single>()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually, you put the training logic in a loop, iterating for a number of \"epochs\". Play around with the loop counter and see what that does. Even a few thousand iterations is reasonable on a CPU for this simple model. When I ran this for 10,000 iterations, I got a loss that was very close to zero.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let learning_rate = 0.01\n",
    "let model = Trivial()\n",
    "let optimizer = torch.optim.Adam(model.parameters())\n",
    "\n",
    "for epoch = 1 to 500 do\n",
    "    // Compute the loss\n",
    "    let pred = model.forward(dataBatch)\n",
    "    let output = loss pred resultBatch\n",
    "\n",
    "    // Clear the gradients before doing the back-propagation\n",
    "    model.zero_grad()\n",
    "\n",
    "    // Do back-progatation, which computes all the gradients.\n",
    "    output.backward()\n",
    "\n",
    "    optimizer.step() |> ignore\n",
    "\n",
    "(loss pred resultBatch).item<single>()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accuracy\n",
    "\n",
    "The loss is useful for optimization, but it's a relatively crude indication of how good the model actually is. In fact, it's largely useless for that. What we really care about is how accurate it is at making predictions. For that, we need to count the number of correct predictions and divide by the batch size.\n",
    "\n",
    "To accomplish that, we have to have some idea of how to interpret the output of the model. In this case, each prediction is a ten-element tensor. How do we read that? Let's decide that we will interpret the model as identifying the index with the largest output value. TorchSharp has a function to do just that, which we can use to evaluate the model after it's been trained.\n",
    "\n",
    "For the ground truth, here are the maximums:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let refMax = resultBatch.argmax(1L)\n",
    "refMax"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, we want the output of the model to match this as much as possible. By visual inspection, it looks pretty good."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let predMax = model.forward(dataBatch).argmax(1L);\n",
    "predMax"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's measure how good this really is. When the batch is thousands or millions of records, we obviously cannot visually inspect. This logic does an element-wise comparison, then sums up all the results (true == 1, false == 0) and divides by the number of elements. You can see, that even with a low loss value, the accuracy isn't that impressive, at least not when you run the training loop for 1,000 iterations. Increase to 10,000 and the results are better. When I ran it, the accuracy went up to 1.0, i.e. perfect.\n",
    "\n",
    "There's only so much we can do with random input data and labels; you can't expect that a model will always find interesting patterns in random data. When you have real data, it's different."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "refMax.eq(predMax).sum() / predMax.numel().ToScalar()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To demonstrate that the model isn't actually very good at anything except memorizing its training data, let's see what the accuracy is with a new batch of data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let x = rand(64,1000)\n",
    "let y = rand(64,10)\n",
    "\n",
    "let yMax = y.argmax(1L)\n",
    "let pMax = model.forward(x).argmax(1L)\n",
    "yMax.eq(pMax).sum() / refMax.numel().ToScalar()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I saw between 3-31% accuracy when I ran this. That's not very impressive -- random guessing should yield 10% accuracy on average. Again, though, it's random data, so we can't expect it to be."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Serializing the Weights\n",
    "\n",
    "That is the model creation and training in a nutshell. Once you have the model, you use the same logic for inferencing, i.e. using the model to make predictions.\n",
    "\n",
    "It is, however, extremely rare that you will be using the model in the same process as you use to train it. In fact, the only situation when you do that, is when you use the model to evaluate its accuracy. To use it after it's been trained, in another process, you have to store the weights on disk and then reload them in the new process.\n",
    "\n",
    "TorchSharp makes it easy to store the weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "model.save(\"tutorial6.model.bin\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To load the weights in a new process, you have to have the model code, create an instance of it with random weights, and then load the saved weights. Let's show how loading the weights makes a difference to the model predictions. Make sure that you save a model with good accuracy -- training for 2500 iterations gave me about 94%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let model1 = Trivial();\n",
    "let predMax = model1.forward(dataBatch).argmax(1L)\n",
    "refMax.eq(predMax).sum() / predMax.numel().ToScalar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let model2 = nn.Module.Create<Trivial>(\"tutorial6.model.bin\") :?> Trivial\n",
    "let predMax = model2.forward(dataBatch).argmax(1L)\n",
    "refMax.eq(predMax).sum() / predMax.numel().ToScalar()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A more elegant solution may be to add a constructor that takes a file path to your model. If you do, __make absolutely sure__ that the call to `load` comes after the call to `RegisterComponents`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There's a relatively detailed article about model serialization at: [Saving and Loading TorchSharp Model Weights](https://github.com/dotnet/TorchSharp/blob/main/docfx/articles/saveload.md), but a couple of things bear repeating here:\n",
    "\n",
    "1. The format used for saving and restoring TorchSharp models is __not__ the same as what Python uses for PyTorch models. The PyTorch format relies on Python pickling, which is a Python-specific serialization format, not compatible with .NET.\n",
    "\n",
    "2. Saving a model means saving its weights only. The model logic is __not__ saved. In order to load the weights, you have to have a model that is exactly like the one that saved the weights. Copy the code, or put it in a separate library that can be loaded by the training and inference code alike.\n",
    "\n",
    "3. You can save a Pytorch model's weights using specific support we added to TorchSharp (see the article mentioned earlier for details). That still means that you have to translate the Python model code into a .NET model that has the same exact structure.\n",
    "\n",
    "4. You cannot save a TorchSharp model and load it in Pytorch.\n",
    "\n",
    "In the future, we are planning to support extracting both the logic and weights, when possible, by relying on ONNX as a language-agnostic whole-model serialization format. ONNX export is already supported by PyTorch, with some limitations."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using Tensorbard\n",
    "\n",
    "Tensorboard is a tool that was originally built for Tensorflow, but it is general enough that it may be used with a number of data sources, including PyTorch and TorchSharp. The TorchSharp support for TB is limited, but it is useful for logging scalars during training, for example by logging accuracy or loss at the end of each epoch.\n",
    "\n",
    "To use Tensorboard, you should create a 'SummaryWriter' instance and use it for logging. If  don't specify a logging directory, one will be created in 'runs' under the current directory. To clear out files, you have to manually delete them using the OS user interface for doing so."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let writer = torch.utils.tensorboard.SummaryWriter(\"runs/trivial\", createRunName=true)\n",
    "\n",
    "let model = Trivial()\n",
    "let optimizer = torch.optim.SGD(model.parameters(), learning_rate)\n",
    "\n",
    "for epoch = 1 to 50 do\n",
    "    for itrt = 1 to 20 do\n",
    "        // Compute the loss\n",
    "        let pred = model.forward(dataBatch)\n",
    "        let output = loss pred resultBatch\n",
    "\n",
    "        // Clear the gradients before doing the back-propagation\n",
    "        model.zero_grad()\n",
    "\n",
    "        // Do back-progatation, which computes all the gradients.\n",
    "        output.backward()\n",
    "\n",
    "        optimizer.step() |> ignore\n",
    "\n",
    "    let pred = model.forward(dataBatch)\n",
    "    let l = (loss pred resultBatch).item<single>()\n",
    "    writer.add_scalar(\"fsharp/loss\", l, epoch)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sequential\n",
    "\n",
    "The prior sections have described the most general way of constructing a model, that is, by creating a class that abstracts the logic of the model and explicitly calls each layer's `forward` methos. While it's not too complicated to do so, it's a lot of \"ceremony\" to accomplish something very regular.\n",
    "\n",
    "Fortunately, for models, or components of models, that simply pass one tensor from layer to layer, there's a class to handle it. It's called `Sequential` and is created by passing a sequence of modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "fsharp"
    },
    "vscode": {
     "languageId": "polyglot-notebook"
    }
   },
   "outputs": [],
   "source": [
    "let seq = nn.Sequential((\"lin1\", nn.Linear(1000L, 100L) :> nn.Module<Tensor,Tensor>), (\"relu\", nn.ReLU() :> nn.Module<Tensor,Tensor>), (\"lin2\", nn.Linear(100L, 10L) :> nn.Module<Tensor,Tensor>))\n",
    "seq.load(\"tutorial6.model.bin\")\n",
    "predMax = seq.forward(dataBatch).argmax(1L)\n",
    "refMax.eq(predMax).sum() / predMax.numel().ToScalar()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How can that be? Well, it's because save/load doesn't actually save and restore the model, it saves and restores the weights. All it cares about is that the layer's with weights have the same definition, and that they have the same name. In the Trivial case, the names were derived from the field names, and in the Sequential case, they are explicitly given. ReLU doesn't have any weights, so the fact that we did it differently doesn't factor it. The name of the relu layer can be anything, it just has to be something that isn't `null`.\n",
    "\n",
    "About `ReLU` -- in the Sequential case, we implemented that with a layer? That's because Sequential requires that its arguments be subclasses of `Module`, so a function doesn't work."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".NET (F#)",
   "language": "F#",
   "name": ".net-fsharp"
  },
  "language_info": {
   "name": "F#"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
